A AfterWork 


Data Analysis with SQL Reading 


While working with data, you'll often be required to query data from various data sources. As 
mentioned before, it’s also likely that you'll have to work with the SQL language. Below are 
some of the concepts that you will put into use while querying. This will not be limited to 
performing statistical operations for data analysis and reporting. 


Filtering data with SQL 
e Sometimes, SQL queries return lots of data you don't need. One of the most powerful 
features of a database is the ability to filter data, i.e., to select only those records that 
match certain criteria. For example, suppose we want to see when a particular site was 
visited. 
e We can perform filtering by selecting these records from the Visited table by using a 
WHERE clause in our query: 


SELECT * FROM Visited WHERE site = 'Humo'; 


Views 
e Aview is a virtual table, that contains rows and columns just like in actual tables found in 
a database. Views are created by selecting fields from one or more tables present in the 
database. 
e Views are created because of some of the following reasons: 
o Simplicity: to hide and reuse complex queries. 
o Column Name Simplication: to provide aliases on column names to make them 
more memorable and/or meaningful. 
o Security: To limit access to data in such a way that a user can see and modify 
exactly what they need and no more. 
Joins: To join and simplify multiple tables into a single virtual table. 
Summary: Act as aggregated tables, where data is aggregated (sum, average 
etc) and the outcome is presented. 
e Below is an example of how can create a view: 
CREATE VIEW House AS 
SELECT house_no, address 
FROM homes WHERE lane < 5; 


Sorting with SQL 


e The ORDER BY statement in SQL is used to sort the fetched data in either ascending or 
descending according to one or more columns. By default ORDER BY sorts the data in 
ascending order. We can use the keyword DESC to sort the data in descending order 
and the keyword ASC to sort in ascending order. 

e Syntax of all ways of using ORDER BY is shown below: 

o Sort according to one column: To sort in ascending or descending order we can 
use the keywords ASC or DESC respectively. 

o Sort according to multiple columns: To sort in ascending or descending order we 
can use the keywords ASC or DESC respectively. To sort according to multiple 
columns, separate the names of columns by the (,) operator. 

e Example of sorting to multiple columns: 


SELECT * FROM table_name ORDER BY column1 ASC|DESC , column2 
ASC|DESC; 


Grouping with SQL 
e The GROUP BY statement is used to arrange identical data into groups with the help of 
some functions. i.e if a particular column has the same values in different rows then it will 
arrange these rows in a group. 
e While grouping we should note: 
o GROUP BY clause is used with the SELECT statement. 
o In the query, GROUP BY clause is placed after the WHERE clause. 
o In the query, GROUP BY clause is placed before ORDER BY clause if used any. 


Data formatting with SQL 

e We can also perform data formatting while selecting our data through the use of the 
round() and cast() functions. 

e The ROUND() function is used to round a numeric field to the number of decimals 
specified. We provide the number as an argument, as well as the length you'd like the 
number to be rounded to. The function also accepts an optional third argument that 
allows you to specify whether the number is rounded or truncated. 

e Note: Many database systems do rounding differently than you might expect. We should 
always investigate our output before making any conclusions. 

e The following SQL statement selects the product name and rounds the price in the 
"Products" table: 


SELECT ProductName, ROUND(Price,0) AS RoundedPrice FROM Products; 


The CAST() function converts a value (of any type) into a specified datatype. 
Converting a value to an int data type below: 


SELECT CAST(34.23 AS int); 


Basic Statistical Techniques 


Statistics are very useful as an initial stage of more in-depth analysis, i.e. for data 
overview and data quality assessment. 

Here's a list of statistical functions (as well as some standard arithmetical functions that 
are useful in calculating statistics) and their meanings. 

AVG(): returns the mean 

COUNT(): returns the population (or sample, depending on the row source) 
SQUARE(n): returns the square of the value specified 

POWER (a,n): returns the value of a to the nth power 

SQRT(n): returns the square root of n 

SUM(): returns the sum of the values in a set 

STDDEV(): returns the standard deviation of a sample 

STDDEVP(): returns the standard deviation of a population 

VAR(): returns the variance of a sample 

VARP(): returns the variance of a population 
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